ITSA*: An Effective Iterative Method for Short-Text Clustering Tasks

Authors

Marcelo Luis Errecalde

Diego Ingaramo

Paolo Rosso

Abstract

The growing use of very short documents (e.g. blogs, text messages, news items and others) has produced increasing interest in automatic processing techniques that can deal with documents of this kind. In this context, “short-text clustering” is a very important research field in which new clustering algorithms have recently been proposed to deal with this difficult problem. In this work, ITSA*, an iterative method based on the bio-inspired method PAntSA, is proposed for this task. ITSA* takes as input the results obtained by arbitrary clustering algorithms and refines them by iteratively applying the PAntSA algorithm. The proposal shows an interesting improvement in the results obtained with different algorithms on several short-text collections. However, ITSA* is not only useful as an improvement method: starting from random initial clusterings, it outperforms well-known clustering algorithms in most of the experimental instances.
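The iterative scheme described in the abstract (take an existing clustering, refine it, feed the result back in) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `iterate_refinement` and `nearest_mean_step` are hypothetical helpers, the data are toy one-dimensional points rather than short texts, and the refinement step is a simple nearest-mean reassignment standing in for PAntSA, whose details are not given on this page.

```python
from typing import Callable, List, Sequence

Labels = List[int]


def iterate_refinement(
    labels: Labels,
    refine: Callable[[Labels], Labels],
    max_iters: int = 50,
) -> Labels:
    """Feed a clustering back into `refine` until it stops changing.

    `refine` is a placeholder for any refinement algorithm; the real
    method uses PAntSA here, which this sketch does not implement.
    """
    current = list(labels)
    for _ in range(max_iters):
        updated = refine(current)
        if updated == current:  # fixed point reached: nothing left to refine
            break
        current = updated
    return current


def nearest_mean_step(points: Sequence[float]) -> Callable[[Labels], Labels]:
    """Build a stand-in refinement step (NOT PAntSA).

    Each point is moved to the cluster whose mean is closest to it.
    """
    def step(labels: Labels) -> Labels:
        clusters = sorted(set(labels))
        means = {
            c: sum(p for p, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters
        }
        return [min(clusters, key=lambda c: abs(p - means[c])) for p in points]

    return step


# Toy data: two obvious groups, with a deliberately poor initial clustering.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
initial = [0, 1, 0, 1, 0, 1]
refined = iterate_refinement(initial, nearest_mean_step(points))
```

The initial labels can come from any clustering algorithm or from a random assignment, which mirrors the two usage modes the abstract mentions; only the refinement step would need to be swapped for the actual algorithm.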

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

An Evaluation on Feature Selection for Text Clustering

Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, we first give empirical evidence that feature selection methods can improve the efficiency and performance of text clustering algorithms. Then we propose a new feature selection method called “Term Contribution ...

Natural scene text localization using edge color signature

Localizing text regions in images taken from natural scenes is a challenging problem due to variations in the font, size, color and orientation of text. In this paper, we introduce a new concept, the so-called Edge Color Signature, for localizing text regions in an image. This method is able to localize both Farsi and English texts. In the proposed method, first a pyramid using diff...

Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the struc...

Self-Taught convolutional neural networks for short text clustering

Short text clustering is a challenging problem due to the sparseness of its text representation. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC2), which can flexibly and successfully incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner. In our framework, the original raw...

Publication date: 2010
